A brief introduction

The Martian Chronicles is a science fiction fix-up novel by Ray Bradbury, published in 1950.

The novel is written as a chronicle and each story is a chapter within an overall chronological ordering of the plot.

It was not originally written as a single work: the idea of assembling it into a novel was suggested by a publisher after most of the stories had already appeared.

The book deals with the exploration and settlement of Mars by Americans leaving a troubled Earth, which is eventually devastated by a nuclear war.

“Bradbury is an authentic original.” - Time magazine

The Martian Chronicles book

The edition analyzed is the trade paperback published by Bantam Books, with illustrations by Ian Miller.

It is composed of 26 chapters. Note that the complete version has 28 chapters; two of them are not included in this edition.

These chapters are:

  • November 2002: The Fire Balloons
  • May 2003: The Wilderness

The first was omitted because of its ambiguous religious interpretation, and the second because it seems out of context.

Word frequency

Let’s use the wordcloud package to create a representation of the most frequent words in The Martian Chronicles.

Word frequencies are encoded as follows: the larger and bolder a word appears, the higher its frequency.

The cloud has a circular layout, with the most frequent word at the centre.

In addition, words with similar frequencies share not only a similar font size and weight but also the same color.
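The counts behind such a word cloud can be sketched in a few lines. The snippet below is only an illustration in Python using the standard library; the sample sentence and the tiny stop-word list are assumptions made for the example, not the data or the stop words used in the analysis.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the one used in the analysis).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "he", "it", "said"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words and count the non-stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Made-up sample text, standing in for the book.
sample = "The captain said the rocket was ready. The captain looked at Mars."
freqs = word_frequencies(sample)
top = freqs.most_common(3)  # the words that would be drawn largest
```

In a word cloud, the font size of each word is then scaled according to its count.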

The most frequent word is captain, which is not surprising, since the captain pilots the rocket and leads the expeditions between Earth and Mars.

Moreover, the captain is one of the most important characters, since he makes the key decisions needed to achieve the goal of the expedition.

Note that the captain changes from one expedition to the next, so it is not the same individual in every chapter.

Bigram analysis

Let’s analyse the bigrams in order to find the main characters in The Martian Chronicles. We will treat bigrams as terms, in the same way that we analyzed individual words.

By finding the characters that appear in each chapter, we can also understand whether some chapters are linked together.

Remember that The Martian Chronicles consists of previously published short stories, whose common context is the exploration and colonization of Mars by Americans and whose explicit common thread is the chronological sequence of the chapters.
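As a rough sketch of what treating bigrams as terms means, the Python snippet below counts pairs of adjacent words and drops every pair that contains a stop word. The sample text and the stop-word list are made up for the example and are not taken from the analysis.

```python
import re
from collections import Counter

# Illustrative stop words (an assumption for this example only).
STOP_WORDS = {"the", "a", "at", "and", "of", "to", "said"}

def bigram_counts(text, stop_words=STOP_WORDS):
    """Count adjacent word pairs, skipping pairs that contain a stop word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(p for p in pairs if p[0] not in stop_words and p[1] not in stop_words)

# Made-up sample text, standing in for a chapter.
sample = ("Captain John Black looked at the town. "
          "Captain Black said nothing. John Black waved.")
counts = bigram_counts(sample)
most_common_bigram = counts.most_common(1)[0]
```

Character names tend to surface in such counts because a first name and a surname, or a title and a name, keep occurring next to each other.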

The character who appears in the most chapters is Captain Williams, the smart leader of the second expedition.

He appears in these chapters:

  • August 1999: THE EARTH MEN
  • March 2000: THE TAXPAYER
  • April 2000: THE THIRD EXPEDITION
  • June 2001: –AND THE MOON BE STILL AS BRIGHT

These four chapters are consecutive and they are located at the beginning of the book.

The chapter with the largest number of bigrams is April 2000: THE THIRD EXPEDITION.

Besides, we can note that three bigrams refer to the same character, Captain John Black, the leader of the third expedition:

  • john black
  • captain john
  • captain black

He appears in two consecutive chapters: April 2000: THE THIRD EXPEDITION and June 2001: –AND THE MOON BE STILL AS BRIGHT.

Pairwise correlation

Bigram analysis is a useful tool for exploring pairs of adjacent words. In our case study, we used it to find the most important characters of the book.

Nevertheless, we are also interested in finding words that tend to appear together in particular sections, even if they don’t occur next to each other as bigrams.

Let’s pick four interesting words with the help of the previous word frequency analysis: captain, Earth, Mars and martian. We will find the words most associated with them.
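One common way to measure how strongly two words are associated across sections is the phi coefficient, which is the Pearson correlation of the two presence/absence indicators. The Python sketch below assumes that this is the measure of interest (the slides do not name it) and uses made-up sections.

```python
from math import sqrt

def phi(word_a, word_b, sections):
    """Phi coefficient of two words over a list of sections (sets of words)."""
    n11 = n10 = n01 = n00 = 0
    for words in sections:
        a, b = word_a in words, word_b in words
        if a and b:
            n11 += 1          # both words present
        elif a:
            n10 += 1          # only word_a present
        elif b:
            n01 += 1          # only word_b present
        else:
            n00 += 1          # neither present
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical sections, each reduced to the set of words it contains.
sections = [
    {"captain", "rocket", "crew"},
    {"captain", "rocket"},
    {"mars", "town"},
    {"mars", "town", "rocket"},
]
together = phi("mars", "town", sections)     # always co-occur
apart = phi("captain", "mars", sections)     # never co-occur
```

A value near 1 means the two words tend to appear in the same sections, while a value near -1 means that one tends to appear where the other does not.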

The term captain is associated with the first names and surnames of the main characters, while Earth and Mars are associated with words related to the settlement and the aim of the exploration.

Moreover, the term martian is associated with Tomás Gomez, who appears in the chapter August 2002: Night Meeting, where he meets several martians and talks with them.

Pairwise correlation graph

Let’s visualize the network in order to see the overall correlation pattern. Remember that the relationships are symmetrical, rather than directional as in bigrams.

Besides, let’s highlight in blue three of the previously picked words, to see whether they are also among the most correlated words.

The other words are painted in orange.

The graph contains a giant component, that is, a connected component containing a significant fraction of all the nodes.

Note that the only previously selected word that appears in it is captain, which is connected to several other words in the pairwise correlation graph.

Let’s use some centrality measures to find the most interesting words in the pairwise correlation graph.

We will use the betweenness centrality and the PageRank centrality.

Centrality measurements

Betweenness centrality measures the extent to which a vertex lies on paths between other vertices. Vertices with high betweenness may have considerable influence within a network by virtue of their control over information passing between others.
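To make the definition concrete, here is a minimal Python sketch of betweenness centrality for an unweighted, undirected graph, using Brandes' algorithm. The toy graph is hypothetical and much smaller than the real correlation graph; it only shows how a hub that every shortest path passes through receives a high score.

```python
from collections import deque

def betweenness(graph):
    """Betweenness of each node of an unweighted, undirected graph
    (adjacency dict), computed with Brandes' algorithm."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                          # nodes in order of distance from s
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}       # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):           # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair is visited from both ends.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical toy graph: "captain" connects three otherwise unrelated words.
toy_graph = {
    "captain": ["williams", "black", "rocket"],
    "williams": ["captain"],
    "black": ["captain"],
    "rocket": ["captain"],
}
centrality = betweenness(toy_graph)
```

Here captain lies on the only path between each of the three pairs of other words, so it gets a score of 3 while the other nodes get 0.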

PageRank centrality is based on PageRank, an algorithm used by Google Search. Its thesis is that a node is important if it is linked from other important and link-parsimonious nodes, or if it is highly linked.
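The idea can be sketched with a simple power iteration in Python. The damping factor of 0.85 and the toy graph are assumptions made for the example, and each undirected edge is treated as a link in both directions.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration on an adjacency dict."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1 - damping) / n for v in graph}
        for v, neighbours in graph.items():
            if neighbours:
                # A node shares its rank equally among the nodes it links to,
                # so a node with few links passes more rank to each of them.
                share = damping * rank[v] / len(neighbours)
                for w in neighbours:
                    new_rank[w] += share
            else:
                # A node without links spreads its rank over all nodes.
                for w in graph:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank

# Hypothetical toy graph, not the real correlation graph.
toy_graph = {
    "captain": ["williams", "black", "rocket"],
    "williams": ["captain"],
    "black": ["captain"],
    "rocket": ["captain"],
}
ranks = pagerank(toy_graph)
best = max(ranks, key=ranks.get)
```

The scores sum to 1, and in this toy graph the highly linked node captain ends up with the largest share.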

Let’s apply the betweenness centrality.

Let’s apply the PageRank centrality.

In addition to being the most used word in the book, captain is also an important word, since it is at the top of the rankings for both of the chosen centrality measures. This is because the captain is the main character in much of The Martian Chronicles.

Let’s now focus on the term teece, which appears in the top two of the rankings for both centrality measures.

The term teece refers to Samuel Teece, a racist and threatening white store owner who appears in the chapter June 2003: WAY IN THE MIDDLE OF THE AIR.

This chapter focuses on a political problem contemporary to the book: racism, prejudice, and discrimination in America.

Remember that the book was written before the Civil Rights movement. Some editions of the book omit this chapter because of its racial language (e.g. the use of the n-word).

“Bradbury is one of the very few authors who dared to consider the effects and consequences of race in America at a time when racism was sanctioned by the culture.” - Isiah Lavender III

Community detection

“Community detection is the problem of finding the natural divisions of a network into groups of vertices, called communities, such that there are many edges within groups and few edges between groups.”
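One standard way to quantify "many edges within groups and few edges between groups" is modularity. The Python sketch below only scores a given division of a hypothetical toy graph; it does not implement the detection algorithm used in the analysis, which the slides do not specify.

```python
def modularity(edges, communities):
    """Modularity of a division of an undirected graph into communities."""
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)       # edges inside each community
    total_degree = [0] * len(communities)   # sum of degrees in each community
    for u, v in edges:
        total_degree[label[u]] += 1
        total_degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))

# Hypothetical toy graph: two triangles joined by a single edge.
edges = [
    ("captain", "williams"), ("williams", "black"), ("captain", "black"),
    ("elma", "sam"), ("sam", "stand"), ("elma", "stand"),
    ("black", "elma"),
]
good = modularity(edges, [{"captain", "williams", "black"}, {"elma", "sam", "stand"}])
bad = modularity(edges, [{"captain", "elma"}, {"williams", "sam"}, {"black", "stand"}])
```

The natural division into the two triangles scores higher than the arbitrary one, and community detection methods typically search for divisions with a high score of this kind.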

Let’s focus the detection on four important words selected using the previous centrality measures:

  • captain
  • teece
  • car
  • elma

Who is Elma?

Elma is Sam Parkhill’s wife. Her husband wants to go to Mars just to set up a hot dog stand, a claim that seems rather strange.

Elma knows the truth about him, but we never learn what Elma wants, why she married Sam, or how she feels about Mars.

Note that their relationship is really odd, since Sam threatens to kill her.

Correlation between chapters

Previously we found the most used bigrams, which corresponded to the main characters of the book.

Let’s see whether these chapters are tied together by something more than just the presence of common characters.

Thus, let’s visualize the correlation plot of the eight chapters selected during the previous phase.
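As an illustration of what a correlation between two chapters can mean, the Python sketch below computes the Pearson correlation of their word counts over a shared vocabulary. This is an assumption about the measure (the slides do not state how the plot was computed), and the three short texts are made up.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0

def chapter_correlation(text_a, text_b):
    """Correlate the word counts of two texts over their joint vocabulary."""
    count_a, count_b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocabulary = sorted(set(count_a) | set(count_b))
    return pearson([count_a[w] for w in vocabulary], [count_b[w] for w in vocabulary])

# Made-up miniature "chapters".
chapter_a = "captain rocket captain crew mars"
chapter_b = "captain crew rocket captain town"
chapter_c = "robot family house robot years"
similar = chapter_correlation(chapter_a, chapter_b)
different = chapter_correlation(chapter_a, chapter_c)
```

Chapters that use the same words with similar frequencies get a high correlation, whereas chapters with disjoint vocabularies get a negative one.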

There is a high correlation between these chapters:

  • August 1999: THE EARTH MEN
  • April 2000: THE THIRD EXPEDITION
  • June 2001: –AND THE MOON BE STILL AS BRIGHT

These chapters describe three different expeditions to Mars made by different American crews, led by different captains. These expeditions share the same ending: they failed because every American crew, including its captain, was killed by the martians.

In addition, the first two chapters listed above also have a high correlation with the chapter April 2026: THE LONG YEARS.

August 1999: THE EARTH MEN & April 2026: THE LONG YEARS

The common tie is insanity. In the chapter August 1999: THE EARTH MEN, some martians became vulnerable, since using their telepathy against the humans has the unintended consequence of mental insanity.

In the other chapter, humans use robots to keep around them the family they lost in the war, but this choice brings a kind of madness upon them, since they cannot accept the death of their loved ones.

April 2000: THE THIRD EXPEDITION & April 2026: THE LONG YEARS

They have in common the concept of the mask. In the chapter April 2000: THE THIRD EXPEDITION, martians disguised themselves as the crew’s human relatives in order to deceive and then kill the crew, including the captain.

In the other chapter, humans used robots as a way of keeping “alive” the family they lost during the war. Thus, the robots are used to soothe their nostalgia.

Sentiment analysis

Let’s address the topic of opinion mining in order to understand the sentiments and emotions in the book.

We know that the book has few positive parts, since its common subject is the exploration and settlement of Mars.

Let’s use these general-purpose lexicons:

  • bing: used to classify words as positive or negative

  • nrc: useful for recognizing eight basic emotions
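A lexicon-based approach can be sketched as a simple dictionary lookup. In the Python snippet below, the small lexicon is a hypothetical stand-in (it is not the real bing lexicon) and the two texts are made up; the net sentiment of a text is the number of positive words minus the number of negative words.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; NOT the real bing lexicon.
TOY_LEXICON = {
    "dead": "negative", "lonely": "negative", "afraid": "negative",
    "hot": "positive", "good": "positive", "saved": "positive",
}

def net_sentiment(text, lexicon=TOY_LEXICON):
    """Number of positive words minus number of negative words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return counts["positive"] - counts["negative"]

# Made-up miniature "chapters".
chapters = {
    "chapter A": "The men were lonely and afraid, and many were dead.",
    "chapter B": "The rockets saved them and the day was good.",
}
scores = {title: net_sentiment(text) for title, text in chapters.items()}
most_negative = min(scores, key=scores.get)
most_positive = max(scores, key=scores.get)
```

Because the lookup works word by word, it ignores context, so a word can be scored with a sentiment it does not carry in a given sentence.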

Let’s find the 10 most frequent words with sentiment content:

The most used sentiment word is dead, which is obviously a negative word. This is because the invasions and the nuclear war caused many deaths.

The second most used sentiment word is hot, which in a sentiment context is a positive word.

Nevertheless, in the book it is usually used to indicate a high temperature or as part of the bigram hot dog.

Thus, hot is a false friend.

Let’s find the most negative chapter:

The most negative chapter is August 2001: THE SETTLERS.

The chapter tells the story of the first settlers, who were actually the lonely ones: they already felt bad at the beginning of the trip, as they started to have regrets and were filled with loneliness and nostalgia.

Let’s find the most positive chapter:

The most positive chapter is October 2002: THE SHORE.

This chapter concerns the transportation of humans to Mars on many rockets in order to escape from the nuclear war.

Several lives were saved thanks to these rocket journeys.

Nevertheless, the rockets came from the USA, so only Americans were saved from the war.

Topic modeling

A topic model is used to discover the abstract topics that occur in a collection of documents, in our case the chapters of the book.

Topic modeling will help us discover the hidden semantic structure of the chapters, where the topics are clusters of similar words.

Let’s use LDA (latent Dirichlet allocation), a method for fitting a topic model. It treats each chapter as a mixture of topics, and each topic as a mixture of words.

For each chapter we will select the most prevalent topic, in order to easily identify its main topic.

The LDA model has two principles:

  • every topic is a mixture of words

  • each document is a mixture of topics
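The two principles can be combined in a small numeric sketch. In the Python snippet below all probabilities are invented for illustration and nothing is fitted: the probability of a word in a chapter is obtained by weighting each topic's word probability by the share of that topic in the chapter.

```python
# Hypothetical topic shares of one chapter (they sum to 1).
chapter_topics = {"topic 1": 0.7, "topic 2": 0.3}

# Hypothetical word probabilities of each topic (each row sums to 1).
topic_words = {
    "topic 1": {"captain": 0.5, "rocket": 0.3, "earth": 0.2},
    "topic 2": {"captain": 0.1, "rocket": 0.2, "earth": 0.7},
}

def word_probability(word, chapter_topics, topic_words):
    """P(word | chapter) = sum over topics of P(topic | chapter) * P(word | topic)."""
    return sum(share * topic_words[topic].get(word, 0.0)
               for topic, share in chapter_topics.items())

probabilities = {w: word_probability(w, chapter_topics, topic_words)
                 for w in ("captain", "rocket", "earth")}
```

Fitting an LDA model amounts to estimating both tables from the word counts of the chapters, rather than writing them by hand as done here.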

Every topic is a mixture of words

This principle provides a method for extracting the per-topic-per-word probabilities, called beta, from the model.

Let’s extract three topics from the chapters and then find the 10 most common words in each topic.
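Once the beta values are available, picking the most common words of each topic is a matter of sorting. The Python sketch below uses invented beta values for two small topics, not the output of the fitted model.

```python
# Hypothetical per-topic-per-word probabilities (beta); values are invented.
beta = {
    "topic 1": {"captain": 0.40, "rocket": 0.35, "earth": 0.25},
    "topic 2": {"earth": 0.50, "home": 0.30, "captain": 0.20},
}

def top_terms(beta, n):
    """Return the n words with the highest beta in each topic."""
    return {
        topic: [word for word, _ in
                sorted(probs.items(), key=lambda item: item[1], reverse=True)[:n]]
        for topic, probs in beta.items()
    }

top = top_terms(beta, 2)
```

With a real model the same selection would be made with n set to 10 for each of the three topics.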

After a close analysis, we can suppose that these are the hidden topics:

  • the hate and disdain of humans for martians and for other humans
  • nostalgia for family and Earth
  • the adventures of the expeditions and explorations on Mars

Each document is a mixture of topics

LDA also allows us to model each chapter as a mixture of topics.

We can examine the per-document-per-topic probabilities, called gamma.

Using the three topics found previously, let’s visualize the most prevalent topic for each chapter.
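Selecting the most prevalent topic of a chapter simply means taking the topic with the largest gamma. The Python sketch below uses invented gamma values for three unnamed chapters, not the values of the fitted model.

```python
# Hypothetical per-document-per-topic probabilities (gamma) for three topics.
gamma = {
    "chapter A": [0.10, 0.20, 0.70],
    "chapter B": [0.15, 0.75, 0.10],
    "chapter C": [0.60, 0.30, 0.10],
}

def dominant_topic(probabilities):
    """Return the 1-based index of the topic with the highest probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best_index + 1

main_topics = {chapter: dominant_topic(p) for chapter, p in gamma.items()}
```

A chapter has a clear topic when one gamma value is much larger than the others, as in these invented rows.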

By using gamma, we can see that most of the chapters have a clear topic.

We can also see that the third topic is mostly treated at the beginning of the book, and the second topic from the middle until the end of the book.

Finally, the first topic appears almost periodically throughout the book.

Drawing our own conclusions

This in-depth analysis of The Martian Chronicles allows us to better understand the content and the sentiments conveyed by the book.

We initially extracted the surface topics using some basic text mining techniques, and then the more hidden subjects using advanced text mining techniques such as the LDA model.

Furthermore, we also had the opportunity to find the ties between the chapters of the book, which was really interesting: since the stories had already appeared before the creation of the book, the bonds between them weren’t that clear.